37 research outputs found

    Evaluating Overfit and Underfit in Models of Network Community Structure

    A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over or underfit on different inputs, finding more, fewer, or just different communities than is optimal, and evaluation methods that use a metadata partition as a ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of over and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluate over and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared. Comment: 22 pages, 13 figures, 3 tables

    Altmetrics Study On Research Outputs In Fields of Social Sciences In Top Iranian Universities

    Purpose: The purpose of this work was to conduct an Altmetrics study of research outputs in the social and behavioral sciences at major Iranian universities during 2010-2020. Methodology: Research outputs in the social and behavioral sciences from major Iranian universities, as indexed in the Scopus database, were reviewed. This applied research used an Altmetrics approach. The Scopus and Altmetrics Explorer databases were used to collect data. Data analysis was performed using descriptive and inferential statistical tests in Excel. Findings: The study revealed that Shahid Beheshti, Tehran, Tarbiat Modares, Tabriz, and Shiraz universities ranked highest in the social sciences in terms of mentions and bookmarks. In addition, at all the universities surveyed, the most mentions were on Twitter and the most bookmarks were on Mendeley. Conclusion: Overall, the findings showed that most of the surveyed universities were not in an acceptable position in terms of social media presence and Altmetrics score, indicating that their researchers are unfamiliar with the benefits of social media and participate little in sharing their research outputs there.

    Detectability thresholds and optimal algorithms for community structure in dynamic networks

    We study the fundamental limits on learning latent community structure in dynamic networks. Specifically, we study dynamic stochastic block models where nodes change their community membership over time, but where edges are generated independently at each time step. In this setting (which is a special case of several existing models), we are able to derive the detectability threshold exactly, as a function of the rate of change and the strength of the communities. Below this threshold, we claim that no algorithm can identify the communities better than chance. We then give two algorithms that are optimal in the sense that they succeed all the way down to this limit. The first uses belief propagation (BP), which gives asymptotically optimal accuracy, and the second is a fast spectral clustering algorithm, based on linearizing the BP equations. We verify our analytic and algorithmic results via numerical simulation, and close with a brief discussion of extensions and open questions. Comment: 9 pages, 3 figures

    A new caerin-like antibacterial peptide from the venom gland of the Iranian scorpion Mesobuthus eupeus: cDNA amplification and sequence analysis

    Scorpion venom consists of different types of peptides and proteins, which are encoded by individual genes. A full-length cDNA of 238 base pairs, encoding a peptide of 74 amino acids, was isolated from the venom gland of the Iranian scorpion Mesobuthus eupeus (Buthidae family). This peptide, named M. eupeus caerin-like antimicrobial peptide (Me-CLAP), belongs to a group of antibacterial peptides previously described from scorpions. In this study, the cDNA sequence encoding Me-CLAP from the M. eupeus venom glands was amplified using reverse transcriptase polymerase chain reaction (RT-PCR) and then analyzed. Me-CLAP has molecular characteristics similar to those of antimicrobial peptides (AMPs) from the same genus, such as Mesobuthus martensii and M. eupeus, and differs more from those of other genera. Keywords: Caerin-like antimicrobial peptide, Mesobuthus eupeus, semi-nested real-time polymerase chain reaction

    Evaluating the scale, growth, and origins of right-wing echo chambers on YouTube

    Although it is understudied relative to other social media platforms, YouTube is arguably the largest and most engaging online media consumption platform in the world. Recently, YouTube's outsize influence has sparked concerns that its recommendation algorithm systematically directs users to radical right-wing content. Here we investigate these concerns with large scale longitudinal data of individuals' browsing behavior spanning January 2016 through December 2019. Consistent with previous work, we find that political news content accounts for a relatively small fraction (11%) of consumption on YouTube, and is dominated by mainstream and largely centrist sources. However, we also find evidence for a small but growing "echo chamber" of far-right content consumption. Users in this community show higher engagement and greater "stickiness" than users who consume any other category of content. Moreover, YouTube accounts for an increasing fraction of these users' overall online news consumption. Finally, while the size, intensity, and growth of this echo chamber present real concerns, we find no evidence that they are caused by YouTube recommendations. Rather, consumption of radical content on YouTube appears to reflect broader patterns of news consumption across the web. Our results emphasize the importance of measuring consumption directly rather than inferring it from recommendations. Comment: 29 pages, 21 figures, 15 tables

    Causally estimating the effect of YouTube's recommender system using counterfactual bots

    In recent years, critics of online platforms have raised concerns about the ability of recommendation algorithms to amplify problematic content, with potentially radicalizing consequences. However, attempts to evaluate the effect of recommenders have suffered from a lack of appropriate counterfactuals -- what a user would have viewed in the absence of algorithmic recommendations -- and hence cannot disentangle the effects of the algorithm from a user's intentions. Here we propose a method that we call "counterfactual bots" to causally estimate the role of algorithmic recommendations on the consumption of highly partisan content. By comparing bots that replicate real users' consumption patterns with "counterfactual" bots that follow rule-based trajectories, we show that, on average, relying exclusively on the recommender results in less partisan consumption, where the effect is most pronounced for heavy partisan consumers. Following a similar method, we also show that if partisan consumers switch to moderate content, YouTube's sidebar recommender "forgets" their partisan preference within roughly 30 videos regardless of their prior history, while homepage recommendations shift more gradually towards moderate content. Overall, our findings indicate that, at least on YouTube, individual consumption patterns mostly reflect individual preferences, where algorithmic recommendations play, if anything, a moderating role.

    Environmental Impact Assessment of the Industrial Estate Development Plan with the Geographical Information System and Matrix Methods

    Background. The purpose of this study was to assess the environmental impact of an industrial estate development plan. Methods. This cross-sectional study was conducted in 2010 in Isfahan province, Iran. GIS and matrix methods were applied. Data analysis was done to identify the current situation of the region, zone vulnerable areas, and scope the region. Quantitative evaluation was done using the matrix of Wooten and Rau. Results. The net score for the impact of industrial unit operation on air quality in the project area was (−3). Given the transport of pollutants from the industrial estate, residential areas within a 2500-meter radius of the city were expected to be more affected. The net score for the impact of industrial unit construction on plant species in the project area was (−2). Environmentally protected areas were not affected by air and soil pollutants because of their distance from the industrial estate. Conclusion. The positive effects of project activities outweigh the drawbacks, and the sum of the scores allocated to project activities across environmental factors was (+37). Overall, the project does not have detrimental effects on the environment or residential neighborhoods. EIA should be considered an anticipatory, participatory environmental management tool, applied before a plan application is determined.